A space-structure based dissimilarity measure for categorical data
Abstract
The development of analysis methods for categorical data began in the 1990s and has been booming in recent years. On the other hand, the performance of many of these methods depends on the metric used. Therefore, determining a dissimilarity measure for categorical data is one of the most attractive recent challenges in data mining problems. However, several similarity/dissimilarity measures proposed in the literature have drawbacks due to high computational cost or poor performance. For this reason, we propose a new distance metric for categorical data. We call it Weighted pairing (W-P) based on feature space-structure, where the weights are understood as the degree of contribution of an attribute to the compact cluster structure. W-P was evaluated in an unsupervised learning framework in terms of a quality index. We test it on six real datasets downloaded from the public UCI repository and compare it with the DM3 method and the Hamming-based H-SBI method. Results show that our proposal outperforms DM3 and H-SBI in different experimental configurations. Also, W-P achieves the highest Rand index values and a better clustering discriminant than the other methods.
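To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of an attribute-weighted mismatch dissimilarity and of the Rand index. It is not the W-P measure: the abstract does not give its definition or how its weights are derived, so the `weights` argument and both function names are hypothetical and the weights are simply supplied by the caller. With all weights equal to 1, the first function reduces to the plain Hamming distance.

```python
from itertools import combinations
from typing import Sequence


def weighted_mismatch(x: Sequence[str], y: Sequence[str],
                      weights: Sequence[float]) -> float:
    """Sum the weights of the attributes on which x and y differ.

    Illustrative only: the weights are caller-supplied assumptions,
    not the space-structure weights used by W-P.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def rand_index(labels_true: Sequence[int],
               labels_pred: Sequence[int]) -> float:
    """Fraction of object pairs on which two partitions agree.

    A pair agrees when both partitions put the two objects in the
    same cluster, or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0  # zero or one object: the partitions trivially agree
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    a = ("red", "small", "round")
    b = ("red", "large", "round")
    print(weighted_mismatch(a, b, (0.5, 0.3, 0.2)))
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The Rand index ignores the cluster label values themselves, so a clustering that swaps the names of two clusters still scores as a perfect match.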
Similar resources
An association-based dissimilarity measure for categorical data
In this paper, we propose a novel method to measure the dissimilarity of categorical data. The key idea is to consider the dissimilarity between two categorical values of an attribute as a combination of dissimilarities between the conditional probability distributions of other attributes given these two values. Experiments with real data show that our dissimilarity estimation method improves t...
Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure
K-Modes is an extension of the K-Means clustering algorithm, developed to cluster categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching (or mismatching) measure. The weights of attribute values contribute substantially to clustering; thus, in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency...
Extending k-Representative Clustering Algorithm with an Information Theoretic-based Dissimilarity Measure for Categorical Objects
This paper aims at introducing a new dissimilarity measure for categorical objects into an extension of k-representative algorithm for clustering categorical data. Basically, the proposed dissimilarity measure is based on an information theoretic definition of similarity introduced by Lin [15] that considers the amount of information of two values in the domain set. In order to demonstrate the ...
A New Symbolic Dissimilarity Measure for Multivalued Data Type and Novel Dissimilarity Approximation Techniques
In this paper a new statistical measure for estimating the degree of dissimilarity between two symbolic objects whose features are multivalued symbolic data type is proposed. In addition two new simple representation techniques viz., interval type and magnitude type for the computed dissimilarity between the symbolic objects are introduced. The dissimilarity matrices obtained are not necessaril...
Feature-Based Dissimilarity Space Classification
General dissimilarity-based learning approaches have been proposed for dissimilarity data sets [1,2]. They often arise in problems in which direct comparisons of objects are made by computing pairwise distances between images, spectra, graphs or strings. Dissimilarity-based classifiers can also be defined in vector spaces [3]. A large comparative study has not been undertaken so far. This paper...
Journal
Journal title: International Journal of Electrical and Computer Engineering
Year: 2021
ISSN: 2088-8708
DOI: https://doi.org/10.11591/ijece.v11i1.pp620-627